Decoding depression: a comprehensive multi-cohort exploration of blood DNA methylation using machine learning and deep learning approaches

The causes of depression are complex, and the current diagnosis methods rely solely on psychiatric evaluations with no incorporation of laboratory biomarkers in clinical practices. We investigated the stability of blood DNA methylation depression signatures in six different populations using six public and two domestic cohorts (n = 1942) conducting mega-analysis and meta-analysis of the individual studies. We evaluated 12 machine learning and deep learning strategies for depression classification both in cross-validation (CV) and in hold-out tests using merged data from 8 separate batches, constructing models with both biased and unbiased feature selection. We found 1987 CpG sites related to depression in both mega- and meta-analysis at the nominal level, and the associated genes were nominally related to axon guidance and immune pathways based on enrichment analysis and eQTM data. Random forest classifiers achieved the highest performance (AUC 0.73 and 0.76) in CV and hold-out tests respectively on the batch-level processed data. In contrast, the methylation showed low predictive power (all AUCs < 0.57) for all classifiers in CV and no predictive power in hold-out tests when used with harmonized data. All models achieved significantly better performance (>14% gain in AUCs) with pre-selected features (selection bias), with some of the models (joint autoencoder-classifier) reaching AUCs of up to 0.91 in the final testing regardless of data preparation. Different algorithmic feature selection approaches may outperform limma, however, random forest models perform well regardless of the strategy. The results provide an overview over potential future biomarkers for depression and highlight many important methodological aspects for DNA methylation-based depression profiling including the use of machine learning strategies.


Supplementary methods 2.2.1 PSY screening and recall
The PSY cohort is composed of adolescents from Uppsala, Sweden that were screened for psychiatric disorders with self-administered questionnaires.The depression status of the participants was characterized utilizing the computer-based Development and Well-Being Assessment (DAWBA) questionnaire [1].A depression band of the questionnaire was used for psychiatric evaluation.This band represents a numeric score ranging from 0 to 5, where each number corresponds with a probability that an individual has depression.These scores have the following probabilities: 0 (<0.1%), 1 (~0.5%), 2 (~3%), 3 (~15%), 4 (~50%), and 5 (>70%) [1].In this work, all individuals with scores higher than three were classified as the "depressed" group.Participants for the study were recruited in different waves, including the screening phase, and a recall phase >1 year after when participants were invited either to come for follow-up or join the study from the start.DNA methylation data at screening is available for 221 participants.DNA methylation data for 91 participants was available exclusively at the recall stage, not overlapping with screening.The DNAm data was obtained from the whole blood sample and DNAm was assessed with the Illumina Infinium HumanMethylation450 array at screening and Human MethylationEPIC at recall.The extensive information regarding sample collection, DNA extraction, and handling is available elsewhere [2].

MPIP cohort 1 and MPIP cohort 2
The cohort GSE125105_MPIP is an epigenome-wide methylation analysis conducted at the Max Planck Institute of Psychiatry, Munich, Germany.Depression status was assessed in the clinical settings according to the Diagnostic and Statistical Manual of Mental Disorders (DSM) IV criteria.Detailed information about cohort, sample preparation, and procedures related to methylation profiling are available in the initial publications [3][4][5][6].All 699 participants were included in the analysis.The cohort GSE74414_MPIP2 was also recruited at the Max Planck Institute of Psychiatry.The study design included an investigation of DNA methylation changes in the context of glucocorticoid receptor stimulation with dexamethasone in MDD cases and controls [7].Depression was characterized by the 21-item Hamilton Depression Rating Scale .Patients having at least a moderate depressive episode (HAMD-21 ≥ 14) were classified as depressed.For this work, we used baseline subjects prior to stimulation.According to the information from the study investigator, one participant was overlapping with GSE125105 MPIP, and thus was excluded from the analysis.The resulting sample contained 32 depression "cases" and 49 controls.
DNAm data has been obtained from the whole blood with the Illumina Infinium HumanMethylation450 array in both cohorts.

GRADY cohort
The cohort GSE72680_GRADY has been created as a result of the Grady Trauma Project study in Atlanta, U.S.This sample includes 422 participants with the objective to investigate associations between genetic and environmental factors with stress.Participants primarily had African-American ethnic backgrounds and came from urban areas with low socioeconomic status [3,8].DNAm data has been obtained from the whole blood with the Illumina Infinium HumanMethylation450 array.The cohort was characterized by multiple psychiatric questionnaires, and depression, in particular, was evaluated with the commonly used Beck Depression Inventory (BDI) [9].BDI represents a numeric score ranging from 0 to 63, where higher numbers indicate stronger depression symptoms.To stratify participants into depressed and non-depressed groups, we used a composite procedure.First, the BDI score was taken into consideration.We used a strict BDI threshold of 21 for the classification of depression [10], as scores 17-20 could be considered as borderline [11].Second, we also considered if the participant was under treatment for depression (as this data was available).Thus, the overall procedure included several steps: 1. Participants with BDI ≥ 21 were considered as depressed.
2. Participants undergoing depression treatment were considered as depressed.
3. Participants with BDI < 21 and missing data on depression treatment were excluded from the analysis.

4.
Participants not undergoing depression treatment with missing BDI data were excluded from the analysis.

All other participants were classified as non-depressed.
The resulting data included 212 cases and 179 controls.

Royal Devon and Exeter cohort
The cohort GSE113725_RDE has been collected by the Royal Devon and Exeter (RDE) Tissue Bank, part of the NIHR Exeter Clinical Research Facility in the UK.The initial study was to investigate associations between the history of depression and inflammation in the landscape of DNA methylation [12].Depression history was self-reported by positively answering a question: 'Has a doctor ever diagnosed you with depression requiring regular medical treatment?'.For the purpose of this work, we only used samples with a history of depression (n=49) and controls (n=48).DNAm was assessed with the Illumina Infinium HumanMethylation450 array.

Molecular Biomarkers of Antidepressant Response Cohort
The cohort GSE198904_DHRC was conducted by McGill University and Douglas Mental Health University Institute in Canada [13,14].Individuals were classified as depressed if they had a diagnosis of a current major depressive episode as per the SCID-I as well as a HAMD-21 score ≥ 20.The control samples were recruited through advertisement.
In the initial dataset, many samples were obtained from the same participants, and thus were excluded prior to the analysis.Two participants were excluded due to missing phenotype information.The curated dataset contained 32 control participants, whereas 186 individuals were categorized as cases.Methylation profiling was performed in the whole blood with Illumina Human MethylationEPIC.

Observational clinical study cohort NCT02489305
The cohort GSE198904_OBS is a part of the observational clinical study OBSERVEMDD0001 (NCT02489305) from Janssen Research & Development with the objective to identify blood parameters to predict the near-term relapse of MDD across several locations in the U.S. [15] "Post-depression" patients were classified by having nonpsychotic, recurrent MDD (DSM-V) within the past 24 months, Montgomery Asberg Depression Rating Scale (MADRS) total score ≤ 14, and having an evidence for recent response (within 3 months) to oral antidepressant treatment.The non-affected controls were classified according to self-reported data.Similar to the previous cohort, many samples were obtained from the same individual, and thus were excluded prior to the analysis.After exclusion, the resulting sample contained 115 cases and 29 controls.Methylation profiling was also performed in the whole blood with Illumina Human MethylationEPIC.

DNA methylation data preprocessing
DNAm data for PSY and GSE125105_MPIP was available in the form of raw IDAT files.DNAm data for other cohorts was available from raw BeadStudio intensities in a CSV format.The R package minfi was used for data loading and preprocessing and values from IDAT files or CSV files with signals were loaded in R [16].Methylation intensities were corrected for background using the noob method in PSY and GSE125105_MPIP resulting in MethylSet class object [17].The noob correction was not performed in other cohorts as imported data already represented MethylSet and background correction with minfi was not possible.Subsequently, extracted beta values were quantile normalized and corrected for type I and type II probe bias via Beta Mixture Quantile Dilation [18] from the watermelon R package [19].The normalized beta values then passed through the series of filtering steps.
Namely, signals from sex chromosomes as well as probes having a SNP with minor allele frequency >5% within the entire probe sequence or probes with SNPs at a CpG site or at single based extension were discarded.Additionally, all cross-reactive probes identified by Chen et al. [20] and Benton et al [21] were further removed.We filtered participants that had less than 75% of probes with detection p-values < 0.00005 (none were excluded).CpG sites that had less than 75% of samples with detection p-values < 0.01 were removed.Bead batch effect correction was performed with ComBat from sva R package.We used a minfi-based implementation of the Houseman algorithm to estimate heterogeneity of white peripheral blood cells [22][23][24].We used a regression-based method to adjust methylation values for cell-based heterogeneity [25].Briefly, methylation value is regressed against estimated cell proportions using a linear model.The obtained residuals (that represent the variance unexplained by cell proportions) are then added to the mean value for a CpG to obtain cell-type adjusted methylation intensity.As the data for methylation was coming from two different arrays, we only performed the analysis for overlapping CpG sites.Thus, every Both, and M-values could be used for statistical analyses, however M-values β empirically showed better performance in specific tests [26].In the present work, the application of either or M-values was considered as a hyperparameter.Prior to the analysis, β the methylation values were ordered according to their genomic coordinates (cumulative positions) and only CpGs that passed QC in all data batches were kept (304,765 CpGs).
Individual cohort-batches were concatenated together to form a single large batch-level processed dataset.Since methylation values in every cohort were centered around different means and showed inconsistent variance, the pooled dataset was further quantile normalized, and stabilized with ComBat (cohort of origin was used as batch).The resulting data represents a harmonized dataset.We compared depression classifications with and without data harmonization.The quality of cohort-batch correction was inspected by PCA on hypervariable CpGs with difference in beta values > 0.2 between 97.5 and 2.5 percentiles of corresponding beta values.PCA was performed with the R package FactoMineR.

Differential methylation analyses
Differential methylation analyses were performed within the R environment.We used both pooled analysis of DNA methylation (data from all cohorts is merged into one) as well as meta-analysis of individual cohorts.The pooled analysis was performed on the harmonized dataset (described above).Differential methylation was assessed using R package limma based on linear models with T statistics being moderated by an empirical Bayes framework that is commonly used for microarray data [27].As DNA methylation and depression could be affected by many potential confounders, we included the age and sex of the participants as covariates in all models since only these variables were available for all samples.In the linear models, depression status (case) was used as the main predictor, covariates included age (numeric), sex (binary), and study factor (8 levels), whereas methylation at a corresponding CpG was treated as a dependent variable.For nominally significant CpGs, we calculated a directional agreement index that corresponds to a fraction of cohorts, where the difference in median methylation in cases vs median methylation in controls had the same sign.
For the meta-analysis of individual cohorts, we used data before harmonization (normalized only at a batch/cohort level).We have performed limma-based modeling on individual cohorts, where models included the same parameters but without a study factor (8 levels).For every nominally significant CpGs in at least one cohort, we calculated the number of occurrences for being nominally significant, significant at the false discovery rate of 5% (FDR), and the directional agreement index for FDR and nominal significances.We performed a meta-analysis of estimated log2 fold changes (Log2FC) for all nominally significant CpGs.We considered this as a minimal requirement and potential indication that such a CpG would have a non-zero effect in the meta-analysis.The limma-estimated Log2FC and corresponding standard errors from eight cohorts were used to fit linear random-effects models with R package metafor (function rma.uni) for every nominally-significant CpG.
Standard errors (SE) in limma for log2 fold changes were estimated according to a package developer as specified in the package support forum: https://support.bioconductor.org/p/70175/.The rma.uni used log2 fold changes (vector of 8 values) and corresponding sampling variances (SE 2 , vector of 8 values) as input per CpG.
The heterogeneity was estimated using Sidik-Jonkman estimator [28,29] since the default τ 2 typically better performing restricted maximum likelihood (REML) was not converging for some of the probes even after 5000 iterations.Study weights were estimated with the default inverse-variance method.Modeling was performed individually as one CpG at a time.
P-values estimated in the meta-analysis were adjusted using the false discovery rate (FDR) method.

Sensitivity analysis and functional analyses
As the initial phenotypic data does not have reported smoking status, we performed estimation of the smoking score from methylation intensities (before harmonization) to see which proportion of participants could be smokers in the analyzed dataset.We used the R package EpiSmokEr [30] to perform estimation of the smoking score based on the methodology initially proposed by Elliott et al [31].We used two score thresholds, both 17.55 (for European population) and 11.79 (for Asian population) as reported in the initial paper [31].The first score threshold should be applied as the present study uses cohorts from populations with primarily European ancestry.However, we also compared the score to the threshold in the Asian population to see what could be the highest expected smoking level (as % of study sample) if the score is strict.
The obtained list of overlapping nominally significant CpGs from pooled analysis and meta-analysis was subject to gene ontology enrichment analysis.We used R package missMethyl to perform enrichment analysis that takes into account a gene-length bias and multi-mapped CpGs [32].We also performed enrichment of CpGs considering them as expression quantitative trait methylation (eQTMs) at FDR < 5% based on the data from the BIOS QTL browser (https://molgenis26.gcc.rug.nl/downloads/biosqtlbrowser/).The initial annotation was modified so genes associated with CpGs were replaced by genes from BIOS QTL.The resulting enrichment was performed as unbiased since mapping to genes was based on expression association rather than genomic position/array design.
The obtained list of overlapping nominally significant CpGs was also mapped to chromatin regulatory elements identified by Roadmap Epigenomics Consortium [33].
Mapping was performed based on two cell/tissue types and included E062 primary mononuclear cells (blood) and E073 prefrontal cortex.A blood sample was used to represent the same sample source as in the current study.Prefrontal cortex was used as it has been consistently linked to depression biology [34].After every CpG was mapped to a corresponding regulatory element (per tissue), we performed an enrichment analysis of mappings obtained from 1987 CpGs in comparison to a mapping from all CpGs included in the study (304,765).The enrichment was performed, using R package clusterProfiler, and was done separately for up-regulated and down-regulated probes.

Feature selection for depression classification
Since the number of initial features is magnitudes times the number of the participants, we performed a feature selection procedure before training classification models.
First, we generated a list of 200 CpGs that represents the Top 200 CpGs from pooled differential methylation analysis with consistent direction for association in all cohorts.This list has a positive predictive bias and was used to evaluate the "maximal theoretical performance" of classifiers.To evaluate performance in an unbiased manner, we implemented the feature selection procedure within the model training process.In this setting, we used limma to identify the top 200 differentially methylated CpGs obtained exclusively from the training sets (generated within each iteration of cross-validation or the entire validation dataset in the final testing).Linear models in limma corresponded to the models used in the pooled differential analysis.For regularized logistic regression models, we also evaluated performance using the Top 10 000 CpGs from limma as such models assign zeros to many coefficients and could be used to select the most relevant features.
We compared limma with alternative algorithm-based feature selection strategies implemented within the training of models (unbiased feature selection).These strategies were based on built-in functions of the scikit-learn [35]

Machine learning classifiers
We explored the possibility of applying DNA methylation for depression classification with DL and ML classifiers.The python module scikit-learn [35] was used as a source for ML classifiers.The models included binary logistic regression (no regularization), "ridge" logistic regression ("L2" regularization), "lasso" logistic regression ("L1" regularization), elastic net logistic regression ("L1L2" regularization), decision tree, random forest, support vector machines (SVMs) with linear or radial basis function (RBF) kernels, and AdaBoost.The classifiers were primarily used with default parameters with a few exceptions: 1) In SVMs, the kernel type was explicitly specified (either RBF or linear).As SVMs are very sensitive to hyperparameters, we performed a grid-search of a C parameter in SVMs with linear kernel and a search of C and gamma in SVMs with RBF kernel.The models were optimized for performance on M-values in harmonized cross-validation dataset.The best configuration with a linear kernel included a model with C=1, whereas the best model with RBF had the following specifications: C=1, gamma=0.25.
2) In the logistic regressions, the maximal number of iterations was set to 5000 to enable solver convergence, the solver, in turn, was set to "saga", and the elastic net L1 ratio was set to 0.5.

Standalone deep learning classifiers
DL models were selected based on the input tensor properties as well as on the previous architectures that were applied in similar domains of applications.We tried nine different versions of small deep neural network classifiers Figure S1.In all models, the first layer represents a 1D tensor with shape (batch, N), where N=200 is the number of selected CpGs.The last layer in all implementations represents a single node with sigmoid activation.
Between the input and output layers, we tried different combinations of hidden layers with regularizations, batch normalization and/or dropout.We also tried different combinations of layer activations.The loss function for all classifiers was represented by a binary cross-entropy.

Joint autoencoder-classifier models
We investigated the applicability of depression classification with encoded DNA methylation data as was proposed in other areas [36][37][38].These approaches imply the encoding of the DNA methylation data into hidden space with autoencoders before the classification.The training of model components is performed jointly.In these models, the latent space is used both for reconstruction and classification.The autoencoder component of models was implemented either as a fully-connected autoencoder or a variational autoencoder (VAE).Each autoencoder consists of an encoder part and a decoder part.A structure of fullyconnected autoencoders was based on a sequence of fully-connected layers with N, 128, 64, nodes for encoder, and 64, 128, N nodes for a decoder.The bottleneck is represented by a fully-connected layer with 32 nodes.
The VAE is the type of autoencoder that models a hidden dimension to conform to a specified distribution (usually Gaussian) [39].The encoder part of VAE included the input layer, followed by fully-connected layers with 128 and 64 nodes respectively.Since VAE models hidden dimension as a normal distribution, the next two layers are specified to perform reparametrization and model the mean and logarithm of variance , which then are used to generate a sample from a normal distribution with a sampling layer defined as where is a sample from a normal distribution (noise variable) and is the generated latent ε  space.The bottleneck is represented by a sampling layer with 32 nodes.The decoder part of VAE matches the one from the fully-connected autoencoder.
Reconstruction loss functions of all autoencoder types were dependent on the input, and mean squared error loss was used for M-values, whereas binary cross-entropy was used for beta-values as defined below In both equations, N corresponds to batch size, and denotes a real methylation intensity,   whereas denotes the reconstructed methylation intensity.ĵ

𝑖
The classifier part of all models represented a small fully-connected neural network with dropout, regularization, batch normalization, and activations treated as hyperparameters.The last node of the classifier had either sigmoid or hyperbolic tangent activation for binary cross-entropy and squared hinge losses, respectively.The structure of all final configurations of DL models is shown in Figure 2. The classification loss (L clf ) was primarily represented by a binary cross-entropy: where N corresponds to batch size, denotes a real class, whereas denotes a predicted     class.In some architectures, we additionally tested a squared hinge loss as a classification loss.The total loss for fully-connected autoencoder-classifier was set as a weighted average of the two loss-functions: where and treated as a hyperparameter.
The VAE, however, requires to have an additional loss function to ensure that hidden dimension conforms to a normal distribution.This loss is derived from the  Kullback-Leibler divergence and formulated by the following equation: where N represents the batch size, and represent parameters for the sampling ( Model training and optimizations and preliminary performance evaluations were conducted on a cross-validation (CV) dataset.This dataset included ~84% of the entire data and was composed of five out of eight cohort-batches (GSE125105_MPIP, GSE198904_DHRC, GSE72680_GRADY, PSY_SCR, GSE113725_RDE).The independent test set was not used for model optimizations and included all data from the remaining three cohorts (PSY recall, GSE74414_MPIP2, and GSE198904_OBS).PSY recall and GSE74414_MPIP2 were included in the test as these cohort-batches represent a population analogue of the respective larger cohorts used in training (PSY_SCR and GSE125105_MPIP).The cohort GSE198904_OBS was allocated to the test set so its total number is ~15% (standard for independent test samples).Data allocation for the independent test sets was strictly based on cohort-batches so cohort-level preprocessing and technical batch (based on physically discrete Illumina BeadChips [40]) is not leaked between test set and CV-set.
The optimizer Adam [41] was used in the training of DL models.The hyperparameter searches were performed with a single 3-fold cross-validation on the CV dataset using averaged hold-out subset AUCs.The evaluation of best-performing classification models was performed via 10 repetitions of 3-fold cross-validation on the same CV dataset using averaged statistics for hold-out CV subsets.The last model evaluation has been performed on a separate hold-out test set.All DL models were implemented in tensorflow2, ML models were implemented in scikit-learn, and analyses were conducted with bash, python, and R (limma-based feature selection).The source code for all stages of the work is publicly-available at https://github.com/AleksandrVSokolov/depression_ML_DL

Figure S4
This figure shows the volcano plot for pooled differential methylation analysis with limma.None of the probes passed the correction with the false discovery rate < 5%.In the figures, a threshold of 0.05 for log 2 fold change was set for illustrative purposes.Blue horizontal lines indicate a threshold for nominal significance log 2 (0.05).

Descriptions for Data S2 to S9
Data S2 -This table displays information on the final dataset, included cohort, and main demographics used in the analysis.Categorical variables are shown as counts and respected percent proportions.Numeric variables are shown as mean ± standard deviation, respected minimum, and maximum.
Data S3 -This table displays the results of pooled differential methylation analysis using limma for probes with a nominal p-value < 0.05.Cohort agreement was calculated per CpG by comparing the respective median methylation of "cases" to the respective median methylation of "controls."A positive value indicates that the median in cases was higher than in controls, whereas a negative value indicates the opposite.The prevailing direction is presented in the table.The agreement index was calculated by dividing the number of cohorts with a prevailing direction for association by the total number of cohorts.
Data S4 -This file shows the list of Top 200 CpGs with consistent directions for associations in all cohorts (median difference) from Data S2 that was used for classifier evaluation with positive performance bias.
Data S5 -This table shows results for differential methylation analysis with limma performed separately in individual cohorts for probes with a p-value nominal < 0.05.Log 2 FC is shown as a 95% confidence interval.Standard errors for Log 2 FC were calculated using limma.
Data S6 -This table contains significance statistics for individual CpG sites (p-value nominal < 0.05 in at least one cohort) as well as an associated meta-analysis.Nominal.posindicates the number of positive (Log 2 FC "Cases" vs "Controls") associations that passed the nominal significance threshold.Nominal.negindicates the number of negative (Log 2 FC "Cases" vs "Controls") associations that passed the nominal significance threshold.Nominal score was calculated as (Nominal.pos+ Nominal.neg)/8.Nominal.agreement was calculated as max(Nominal.pos,Nominal.neg)/(Nominal.pos+ Nominal.neg).Nominal count equals Nominal.pos+ Nominal.neg.Statistics for FDR express the same but for FDR-significant CpGs.meta_LFc indicates meta-calculated Log 2 FC for a CpG using linear random effects model.Meta_se indicates the standard error, meta_pval indicates the associated p-value nominal for estimated meta_LFc.Other column show the following: tau2 -τ2 statistic or estimated amount of (residual) heterogeneity, I2 -I 2 statistic, H2 -H2 statistic, Q -test statistic of the test for (residual) heterogeneity, Q.p -p-value for Q. meta_FDR indicates meta_pval adjusted with false discovery rate correction.Meta-analysis statistics were calculated using the R package metafor Data S7 -1) This table shows gene ontology (biological processes) enrichment analysis of 1987 overlapping CpGs taking into the account gene length bias and probes annotated to multiple genes 2) This table provides a list of CpG-Transcript (eQTM) associations in whole blood at FDR < 5% from BIOS QTL browser.3) This table shows affected gene and the number of associated CpG sites from eQTM table 4) This table shows gene ontology (biological processes) enrichment analysis of 1987 overlapping CpGs where each CpG was mapped to a gene through eQTM association but not through its coordinates.5) This table shows overrepresentation analysis for chromatin regulatory elements based on the Roadmap Epigenomics Project.State names and description are provided based on the information here: https://egg2.wustl.edu/roadmap/web_portal/chr_state_learning.html#core_15state.
Analysis considered two tissues separately: E073 Pre-frontal cortex, E062 Primary mononuclear cells (blood).Data S8 -These tables show results for evaluation of classification models in the cross-validation tests and evaluation in the final hold-out set using data after harmonization Data S9 -These tables show results for evaluation of classification models in the cross-validation tests and evaluation in the final hold-out set using data without harmonization Data S10 -These tables show results for evaluation of classification models and feature selection strategies in the cross-validation tests and evaluation in the final hold-out set.All evaluations were performed on M-values using both harmonized data and data without harmonization (batch-level data).
Figure S5 (provided as separate PDF) participant in each cohort was characterized by a 1D methylation tensor consisting of 304765 methylation values.DNAm intensity could be represented as either beta-values ( ) or as β ∈ [0, 1] M-values ( ), where R denotes real numbers.The equations below denote the  ∈  calculation of beta-values and M-values respectively from the raw signal intensifies.The 100 represents Illumina's constant to avoid divisions by 0.
above.The total loss for VAE classifier, in turn was formulated as a sum of reconstruction loss, , and classification loss scaled by a parameter .as a hyperparameter. ∈ [0, + )2.10Model training and evaluationModel training and hyperparameter searches were either performed at the local computer with NVIDIA RTX A5000 GPU or at the dedicated server nodes (with two NVIDIA A100 GPUs) provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX).

Figure S5 .
Figure S5.This figure shows estimated smoking scores using R package EpiSmokEr.We used data without harmonization (batch-level data).Smoking scores were calculated individually in each cohort-batch, using CpGs that passed quality control steps.Blue line indicates a threshold of 11.79, whereas a red line shows the threshold of 17.55.Both thresholds are from the original publication by Elliott et al.

Figure S6 .
Figure S6.This figure shows the performance of classifiers in relation to feature selection strategies.Y-axis indicates AUC score calculated in the hold-out set for each fold of 3-fold cross-validation.X-axis indicates a feature selection strategy.The first column corresponds to the feature selection by limma.Facet axis indicates a classifier model (final configuration).All estimations were performed using 10 repetitions of 3-fold cross validation.Methylation intensity was used as M-values.Each AUC score corresponds to individual fold, and thus every combination of the classifier model and feature selection strategy is represented by 30 AUC values.